Abstract

The problem of optimal actuation for channel and source coding was recently formulated and solved 
in a number of relevant scenarios. In this class of models, actions are taken at encoders or decoders, 
either to acquire side information in an efficient way or to control or probe effectively the channel state. 
In this paper, the problem of embedding information on the actions is studied for both the source and 
the channel coding set-ups. In both cases, a decoder is present that observes only a function of the 
actions taken by an encoder or a decoder of an action-dependent point-to-point link. For the source 
coding model, this decoder wishes to reconstruct a lossy version of the source being transmitted over 
the point-to-point link, while for the channel coding problem the decoder wishes to retrieve a portion 
of the message conveyed over the link. 

For the problem of source coding with actions taken at the decoder, a single-letter characterization
of the set of all achievable tuples of rate, distortions at the two decoders and action cost is derived, 
under the assumption that the mentioned decoder observes a function of the actions non-causally, strictly 
causally or causally. A special case of the problem in which the actions are taken by the encoder is also 
solved. A single-letter characterization of the achievable capacity-cost region is then obtained for the 
channel coding set-up with actions. Examples are provided that shed light on the effect of information
embedding on the actions for the action-dependent source and channel coding problems.
Index Terms

Action-dependent source coding, action-dependent channel coding, block Markov decoding, cribbing,
forward encoding, information embedding, source-channel separation, side information,
side information vending machine.

I. Introduction 

The recent works [1], [2] study the problem of optimal actuation for source and channel
coding for resource-constrained systems. Specifically, in [1], an extension of the Wyner-Ziv
source coding problem is considered in which the decoder or the encoder can take actions
that affect the quality of the side information available at the decoder's side. When the actions
are taken by the decoder, the latter operates in two stages. In the first stage, based on the
message received from the encoder, the decoder selects cost-constrained actions A that affect
the measurement of the side information Y. This effect is modelled by a channel $p_{Y|X,A}(y|x,a)$,
where X represents the source available at the encoder. In the second stage, the decoder produces
an estimate of source X based on the side information Y as in the standard Wyner-Ziv problem
(see, e.g., [3]). A similar formulation also applies when the actions are taken at the encoder's
side. This model can account, as an example, for computer networks in which the acquisition of 
side information from remote data bases is costly in terms of system resources and thus should 
be done efficiently. We refer to this class of problems as having actions for side information 
acquisition. 

In [2], a related channel coding problem is studied in which the encoder in a point-to-point
channel can take actions to affect the state of a channel. The encoder operates in two stages. In
the first stage, based on the message to be conveyed to the decoder, cost-constrained actions A
are selected by the encoder that affect the channel state S of the channel $p_{Y|X,S}(y|x,s)$ used for
communication to the decoder in the second stage. In the second stage, the channel $p_{Y|X,S}(y|x,s)$
is used in a standard way based on the available information about the state S (which can be
non-causal or causal, see, e.g., [3]). We refer to this problem as having actions for channel state
control. As shown in [4], this model can be used to account for an encoder that in the first stage
probes the channel to acquire state information.
A. Information Embedding on Actions

As discussed above, optimal actuation for channel and source coding, as proposed in [1], [2],
prescribes the selection of the actions A towards the goal of improving the performance of the 
resource-constrained communication link between encoder and decoder. This can be done by 
acquiring side information in an efficient way for source coding problems, and by controlling 
or probing effectively the channel state for channel coding problems. 

This work starts from the observation that the actions A often entail the use of physical
resources for communication within the system encompassing the link under study. For instance,
acquiring information from a data base requires the receiver to exchange control signals with a
server, and probing the congestion state of a network (modelled as a channel) requires transmission
of training packets to the closest router. In all these cases, the "recipient" of the actions,
e.g., the server or a router in the examples above, may request to obtain partial information 
about the source or message being communicated on the link. To illustrate this point, the server 
in the data base application might need to acquire some explicit information about the file being 
transmitted in the link before granting access to the server. Similarly, the router might need 
to obtain the header of the packet (message) that the transmitter intends to deliver to the end 
receiver. 

In the scenarios discussed above, the action A thus serves a double purpose: on the one hand,
it should be designed to improve the performance of the communication link at hand as in [1],
[2], [4], and, on the other, it should provide explicit information about source or message for a
separate decoder (the server or router in the examples above). A relevant question thus is: How
much information can be embedded in the actions A without affecting the performance of the
link? Or, to turn the question around, what is the performance loss for the link as a function of
the amount of information that is encoded in the actions A? This work aims at answering these
questions for both the source and channel coding scenarios discussed above (see Fig. 1, Fig. 2,
Fig. 3 and Sec. I-C).

B. Related Work 

The interplay between communication and actuation, or control, is recognized to arise at
different levels. As mentioned, the main theme in the papers [1], [2], [4] is "control for
communication": in [1], [2], [4], actuation is instrumental in improving the performance of a
resource-constrained communication system. Extensions of this research direction include models with
additional design constraints [5], [6], with adaptive actions [7], with memory [8], and with
multiple terminals [9]-[14]. Somewhat related, but distinct, is the line of work including [15]-[16],
in which control-theoretic tools are leveraged to design effective communication schemes. An
altogether different theme is instead central in work such as [17], [18] that can be referred to as
"communication for control". In fact, in a reversed way, in [17], [18] (and references therein),
communication is instrumental in carrying out control tasks such as stabilization of a plant. For
instance, [18] shows that an implicit message communicated between two controllers can greatly
improve the performance of the control task.

Figure 1. Source coding with decoder-side actions for information acquisition and with information embedding on actions. A
function of the actions $f^n(A^n) = (f(A_1), \ldots, f(A_n))$ is observed in full ("non-causally") by Decoder 2 before decoding. See Fig. 4
and Fig. 5 for the corresponding models with strictly causal and causal observation of the actions at Decoder 2, respectively.

The idea of embedding information in the actions is related to the classical problem of 
information hiding (see, e.g., [19] and references therein). In information hiding, a message is
embedded in host data under distortion constraints. The message is then retrieved by a decoder
that observes the host signal through a noisy channel. Note that the (host) signal onto which 
the message is embedded is a given process. Instead, in the set-up of information embedding 
on actions considered here, the (action) signal on which information is embedded is designed to 
optimize the given communication task. 

The set-up at hand is also related to the source coding model of [20], in which an encoder
communicates to two decoders and one of the decoders is able to observe the source estimate
produced by the other. For its duality with the classical channel coding model studied in [22],
the operation of the first decoder was referred to in [20] as cribbing. Although the problem
of interest here (in the source coding part) is significantly different, in that the recipient of
the embedded information is a decoder "cribbing" the actions and not the estimates of another
decoder, the solutions of the two problems turn out to be related, as will be discussed.

Figure 2. Source coding with encoder-side actions for information acquisition and with information embedding on actions.

Figure 3. Channel coding with actions for channel state control and with information embedding on actions.

C. Contributions and Paper Organization 

The main contributions of this paper are as follows. 

• Decoder-side actions for side information acquisition: We first consider the model in Fig. 1,
in which the problem of source coding with actions taken at the decoder (Decoder 1) [1]
is generalized by including an additional decoder (Decoder 2). Decoder 2 is the recipient of
a function of the action sequence and is interested in reconstructing a lossy version of the
source measured at the encoder. A single-letter characterization of the set of all achievable
tuples of rate, distortions at the two decoders and action cost is derived in Sec. II under
the assumption that Decoder 2 observes a function of the actions non-causally (Sec. II-B),
strictly causally (Sec. II-C) or causally (Sec. II-D). An example is provided to shed light
on the effect of information embedding on actions in Sec. II-E;

• Encoder-side actions for side information acquisition: We then consider the set-up in Fig. 2,
in which an additional decoder observing the actions is added to the problem of source
coding with actions taken at the encoder [1]. Sec. III derives the achievable rate-distortion-cost
region in the special case in which the channel $p_{Y|X,A}(y|x,a)$ with source and action
(X, A) as inputs and side information Y as output is such that Y is a deterministic function
of A;

• Actions for channel control and probing: Finally, we consider the impact of information
embedding on actions for channel control by studying the set-up in Fig. 3, which generalizes
the model in [2]. Specifically, a decoder (Decoder 1) is added to the model in [2] that observes a
function of the actions taken by the encoder and wishes to decode part of the message
that is intended for the channel decoder (Decoder 2). A single-letter characterization of the
achievable capacity-cost region is obtained in Sec. IV. Finally, the special case of actions
for channel probing [4] is elaborated on with an example in Sec. IV-C.

II. Decoder-Side Actions for Side Information Acquisition 

In this section, we first describe the system model for the set-ups illustrated in Fig. 1, Fig. 4
and Fig. 5 of source coding with decoder-side actions. Then, a single-letter characterization
of the set of all achievable tuples of rate, distortions at the two decoders and action cost is
derived under the assumption that Decoder 2 observes the actions fully (non-causally) in Sec.
II-B, strictly causally in Sec. II-C and causally in Sec. II-D. An example is provided in Sec. II-E.
A. System Model 

We present here the problem corresponding to full observation of a function of the actions
as per Fig. 1. We refer to this model as having non-causal action observation. The changes
necessary to account for strictly causal or causal observation, as illustrated in Fig. 4 and Fig. 5, will be
discussed in the appropriate sections later. It is remarked that this definition does not entail any
non-causal operation, but only a larger estimation delay for Decoder 2 as compared to the causal
cases in Fig. 4 and Fig. 5. The model is defined by the probability mass functions (pmfs) $p_X(x)$
and $p_{Y|AX}(y|a,x)$, by the function $f: \mathcal{A} \to \mathcal{B}$, and by discrete alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{A}, \mathcal{B}, \hat{\mathcal{X}}_1, \hat{\mathcal{X}}_2$,
as follows. The source sequence $X^n$ is such that $X_i \in \mathcal{X}$ for $i \in [1,n]$ is independent and
identically distributed (i.i.d.) with pmf $p_X(x)$. The Encoder measures sequence $X^n$ and encodes
it in a message M of nR bits, which is delivered to Decoder 1. Decoder 1 receives message M
and selects an action sequence $A^n$, where $A^n \in \mathcal{A}^n$. The action sequence affects the quality of
the measurement $Y^n$ of sequence $X^n$ obtained at Decoder 1. Specifically, given $A^n = a^n$
and $X^n = x^n$, the sequence $Y^n$ is distributed as $p(y^n|a^n,x^n) = \prod_{i=1}^{n} p_{Y|AX}(y_i|a_i,x_i)$. The cost
of the action sequence is defined by a cost function $\Lambda: \mathcal{A} \to [0, \Lambda_{\max}]$ with $0 \le \Lambda_{\max} < \infty$,
as $\Lambda(a^n) = \sum_{i=1}^{n} \Lambda(a_i)$. The estimated sequence $\hat{X}_1^n \in \hat{\mathcal{X}}_1^n$ is then obtained as a function of
M and $Y^n$. Decoder 2 observes a function of the action sequence $A^n$, thus obtaining $f^n(A^n) =
(f(A_1), \ldots, f(A_n)) \in \mathcal{B}^n$. Based on $f^n(A^n)$, Decoder 2 obtains an estimate $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ within
given distortion requirements. The estimated sequences $\hat{X}_j^n$ for j = 1, 2 must satisfy distortion
constraints defined by functions $d_j(x, \hat{x}_j): \mathcal{X} \times \hat{\mathcal{X}}_j \to [0, D_{j,\max}]$ with $0 \le D_{j,\max} < \infty$ for
j = 1, 2, respectively. A formal description of the operations at encoder and decoder follows.

Definition 1. An $(n, R, D_1, D_2, \Gamma)$ code for the set-up of Fig. 1 consists of a source encoder

$h^{(e)}: \mathcal{X}^n \to [1, 2^{nR}]$, (1)

which maps the sequence $X^n$ into a message M; an "action" function

$h^{(a)}: [1, 2^{nR}] \to \mathcal{A}^n$, (2)

which maps the message M into an action sequence $A^n$; two decoders, namely

$h_1^{(d)}: [1, 2^{nR}] \times \mathcal{Y}^n \to \hat{\mathcal{X}}_1^n$, (3)

which maps the message M and the measured sequence $Y^n$ into the estimated sequence $\hat{X}_1^n$;

$h_2^{(d)}: \mathcal{B}^n \to \hat{\mathcal{X}}_2^n$, (4)

which maps the observed sequence $f^n(A^n)$ into the estimated sequence $\hat{X}_2^n$; such that the
action cost constraint $\Gamma$ and distortion constraints $D_j$ for j = 1, 2 are satisfied, i.e.,

$\frac{1}{n} \sum_{i=1}^{n} E[\Lambda(A_i)] \le \Gamma$ (5)

and $\frac{1}{n} \sum_{i=1}^{n} E[d_j(X_i, \hat{X}_{ji})] \le D_j$ for j = 1, 2. (6)



July 26, 2012 



DRAFT 






Definition 2. Given a distortion-cost tuple $(D_1, D_2, \Gamma)$, a rate R is said to be achievable if, for
any $\epsilon > 0$ and sufficiently large n, there exists an $(n, R, D_1 + \epsilon, D_2 + \epsilon, \Gamma + \epsilon)$ code.

Definition 3. The rate-distortion-cost function $R(D_1, D_2, \Gamma)$ is defined as $R(D_1, D_2, \Gamma) = \inf\{R:$
the tuple $(R, D_1, D_2, \Gamma)$ is achievable$\}$.
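The quantities entering Definition 1 can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions: the helper names (`sample_side_information`, `average_cost`, `average_distortion`), the array layout `channel[a, x, y]` and the use of Hamming distortion are illustrative choices and do not appear in the model description above.

```python
import numpy as np

def sample_side_information(x, a, channel, rng):
    """Draw y^n from prod_i p(y_i | a_i, x_i).

    channel[a, x] is assumed to hold the pmf of Y given (A, X) = (a, x).
    """
    num_outputs = channel.shape[2]
    return np.array([rng.choice(num_outputs, p=channel[ai, xi])
                     for xi, ai in zip(x, a)])

def average_cost(a, cost):
    """Empirical per-symbol action cost (1/n) sum_i Lambda(a_i), cf. (5)."""
    return float(np.mean(cost[a]))

def average_distortion(x, x_hat):
    """Empirical per-symbol Hamming distortion, one possible d_j in (6)."""
    return float(np.mean(x != x_hat))
```

With a noiseless channel, for instance, `sample_side_information` returns the source sequence itself, and `average_cost` with cost vector `[0, 1]` simply counts the fraction of actions equal to 1.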

In the rest of this section, for simplicity of notation, we drop the subscripts from the definition 
of the pmfs, thus identifying a pmf by its argument. 

B. Non-Causal Action Observation 

In this section, a single-letter characterization of the rate-distortion-cost function is derived for the
set-up in Fig. 1, in which Decoder 2 observes the entire sequence $f^n(A^n)$ prior to decoding.

Proposition 1. The rate-distortion-cost function $R(D_1, D_2, \Gamma)$ for the source coding problem
with decoder-side actions and non-causal observation of the actions at Decoder 2 illustrated in
Fig. 1 is given by

$R(D_1, D_2, \Gamma) = \min_{p(\hat{x}_2, a, u|x),\, g(U,Y)} I(X; \hat{X}_2, A) + I(X; U|\hat{X}_2, A, Y)$, (7)

where the mutual information is evaluated with respect to the joint pmf

$p(x, y, a, \hat{x}_2, u) = p(x) p(\hat{x}_2, a, u|x) p(y|x, a)$, (8)

for some pmf $p(\hat{x}_2, a, u|x)$ such that the inequalities

$E[d_j(X, \hat{X}_j)] \le D_j$, for j = 1, 2, (9a)
$E[\Lambda(A)] \le \Gamma$, (9b)
and $I(X; \hat{X}_2, f(A)) \le H(f(A))$ (9c)

are satisfied for $\hat{X}_1 = g(U, Y)$ for some function $g: \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}}_1$. Finally, U is an auxiliary
random variable whose alphabet cardinality can be constrained as $|\mathcal{U}| \le |\mathcal{X}||\hat{\mathcal{X}}_2||\mathcal{A}| + 1$ without
loss of optimality.
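As a sanity check on (7) and (9c), one can look at the special case in which Decoder 2 imposes no fidelity requirement. The sketch below assumes $D_2 = D_{2,\max}$, so that $\hat{X}_2$ may be taken to be a constant; it is an illustration and not part of the proof.

```latex
% Assumption: D_2 = D_{2,\max}, so \hat{X}_2 is chosen as a constant.
\begin{align*}
I(X;\hat{X}_2,A) + I(X;U|\hat{X}_2,A,Y) &= I(X;A) + I(X;U|A,Y), \\
I(X;\hat{X}_2,f(A)) = I(X;f(A)) &\le H(f(A)).
\end{align*}
```

The second line shows that (9c) then holds automatically, and the first line has the same form as the rate expression for source coding with decoder-side actions and a single decoder, as studied in [1].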

At an intuitive level, in (7), the term $I(X; \hat{X}_2, A)$ accounts for the rate needed to instruct
Decoder 1 about the actions A to be taken for the acquisition of the side information Y,
which are selected on the basis of the source X, and, at the same time, to communicate the
reconstruction $\hat{X}_2$ to Decoder 2. The additional rate $I(X; U|\hat{X}_2, A, Y)$ is instead required to
refine the description of the source X provided via $(\hat{X}_2, A)$ using an auxiliary codebook U for
Decoder 1. Note that this rate is conditioned on the side information Y, thanks to the rate saving
obtained through Wyner-Ziv binning. The condition (9c) ensures that, based on the observation
of f(A), Decoder 2 is able to reconstruct $\hat{X}_2$. The details of achievability follow as a combination
of the techniques proposed in [1] and [24], [20]. Below we briefly outline the main ideas, since
the technical details follow from standard arguments. The proof of the converse is provided in
Appendix A.

Sketch of the achievability proof: We fix a pmf (8) and define a random variable B = f(A).
The joint pmf $p(x, y, a, \hat{x}_2, u, b)$ of variables $(X, Y, A, \hat{X}_2, U, B)$ is obtained by multiplying the
right-hand side of (8) by the term $1\{b = f(a)\}$.¹ In the scheme at hand, the Encoder first maps
sequence $X^n$ into a sequence $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ using the joint typicality criterion with respect to the
joint pmf $p(x, \hat{x}_2)$. This mapping requires a codebook of rate $I(X; \hat{X}_2)$ (see, e.g., [3, pp. 62-63]).
Given the sequence $\hat{X}_2^n$, the sequence $X^n$ is further mapped into a sequence $B^n \in \mathcal{B}^n$ using
the joint typicality criterion with respect to the joint pmf $p(x, b|\hat{x}_2)$ where B = f(A), which
requires a codebook of rate $I(X; f(A)|\hat{X}_2)$ for each sequence $\hat{X}_2^n$. For later reference, we refer
to every such codebook as a bin in the following. Note that we have one bin for every sequence
$\hat{X}_2^n$. For each pair $(\hat{X}_2^n, B^n)$, the sequence $X^n$ is mapped into an action sequence $A^n$ using
joint typicality with respect to the joint pmf $p(x, a|\hat{x}_2, b)$, which requires a codebook of rate
$I(X; A|\hat{X}_2, f(A))$. Note that, by construction, we have that $B^n = f^n(A^n)$ for each generated $A^n$.
Finally, the source sequence $X^n$ is mapped into a sequence $U^n$ using the joint typicality criterion
with respect to the joint pmf $p(x, u|\hat{x}_2, a)$, which requires a codebook of rate $I(X; U|\hat{X}_2, A)$ for
each pair $(\hat{X}_2^n, A^n)$.

The indices of codewords $\hat{X}_2^n$, $B^n$ and $A^n$ are sent to Decoder 1, along with the index for the
codeword $U^n$. For the latter, by leveraging the side information $Y^n$ available at Decoder 1, the rate
can be reduced to $I(X; U|\hat{X}_2, A, Y)$ by the Wyner-Ziv theorem [3, p. 280]. Decoder 2 estimates
the sequence $\hat{X}_2^n$ from the observed sequence $f^n(A^n)$ as follows: if there is only one bin containing
the observed sequence $f^n(A^n)$, then $\hat{X}_2^n$ equals the sequence corresponding to such bin (recall that
each bin corresponds to one sequence $\hat{X}_2^n$). Otherwise, an error is declared. To obtain a vanishing
probability of error, the sequence $f^n(A^n)$ should thus not lie within more than one bin with high
probability. The probability of the latter event can be upper bounded by $2^{n(I(X; \hat{X}_2, f(A)) - H(f(A)))}$,
since each sequence $B^n$ is generated with probability approximately $2^{-nH(f(A))}$ and there are
$2^{nI(X; \hat{X}_2, f(A))}$ sequences $B^n$ [20]. Therefore, as long as $I(X; \hat{X}_2, f(A)) < H(f(A))$, Decoder 2
is able to infer the conveyed bin index with high probability. Finally, Decoder 1 produces the
estimate $\hat{X}_1^n$ through a symbol-by-symbol function as $\hat{X}_{1i} = g(U_i, Y_i)$ for $i \in [1, n]$. □

¹ The notation $1\{S\}$ is used for the indicator function of the event S.
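The way the three codebook rates used for $(\hat{X}_2, f(A), A)$ combine into the first term of (7) can be written out with the chain rule for mutual information. The following short derivation is an illustrative sketch; it uses only the fact that B = f(A) is a deterministic function of A.

```latex
% B = f(A) is a function of A, so (\hat{X}_2, f(A), A) and (\hat{X}_2, A)
% carry the same information about X.
\begin{align*}
I(X;\hat{X}_2) + I(X;f(A)|\hat{X}_2) + I(X;A|\hat{X}_2,f(A))
  &= I(X;\hat{X}_2,f(A),A) \\
  &= I(X;\hat{X}_2,A).
\end{align*}
% Adding the Wyner-Ziv rate I(X;U|\hat{X}_2,A,Y) gives the right-hand side of (7).
```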

Remark 1. Assume that the action $A_i$ is allowed to be a function, not only of the message M
as per (2), but also of the previous values of the side information $Y^{i-1}$, which we refer to as
adaptive actions. Then, the rate-distortion-cost function derived in Proposition 1 can generally be
improved. This can be seen by considering the case in which R = 0. In this case, if the actions
were selected as per (2), then the distortion at Decoder 2 would be forced to be maximal,
i.e., $D_2 = D_{2,\max}$, since the actions A cannot depend in any way on the source X. Instead, by
selecting A as a function of the previously observed values of Y, Decoder 1 can provide Decoder
2 with information about X, thus decreasing the distortion $D_2$. It is noted that the usefulness of
adaptive actions in this setting contrasts with the known fact that, in the absence of Decoder 2,
adaptive actions do not decrease the rate-distortion function [7].

C. Strictly Causal Action Observation 

The system model for the set-up in Fig. 4 is similar to the one described in Sec. II-A, with
the only difference that the decoding function for Decoder 2 at time i is given as

$h_{2i}^{(d)}: \mathcal{B}^{i-1} \to \hat{\mathcal{X}}_2$, (10)

which maps the strictly causally observed sequence $f^{i-1}(A^{i-1}) = (f(A_1), \ldots, f(A_{i-1}))$ into the ith
estimated symbol $\hat{X}_{2i}$.

Proposition 2. The rate-distortion-cost function $R(D_1, D_2, \Gamma)$ for the source coding problem with
decoder-side actions and strictly causal observation of the actions at Decoder 2 as illustrated
in Fig. 4 is given by

$R(D_1, D_2, \Gamma) = \min_{p(\hat{x}_2, a, u|x),\, g(U,Y)} I(X; \hat{X}_2, A) + I(X; U|\hat{X}_2, A, Y)$, (11)

where the mutual information is evaluated with respect to the joint pmf

$p(x, y, a, \hat{x}_2, u) = p(x) p(\hat{x}_2, a, u|x) p(y|x, a)$, (12)

for some pmf $p(\hat{x}_2, a, u|x)$ such that the inequalities

$E[d_j(X, \hat{X}_j)] \le D_j$, for j = 1, 2, (13a)
$E[\Lambda(A)] \le \Gamma$, (13b)
and $I(X; \hat{X}_2, f(A)) \le H(f(A)|\hat{X}_2)$ (13c)

are satisfied for $\hat{X}_1 = g(U, Y)$ for some function $g: \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}}_1$. Finally, U is an auxiliary
random variable whose alphabet cardinality can be constrained as $|\mathcal{U}| \le |\mathcal{X}||\hat{\mathcal{X}}_2||\mathcal{A}| + 1$ without
loss of optimality.

Figure 4. Source coding with decoder-side actions for information acquisition and with information embedding on actions. At
time i, Decoder 2 has available the samples $f^{i-1}(A^{i-1}) = (f(A_1), \ldots, f(A_{i-1}))$ in a strictly causal fashion.

The only difference between the rate-distortion-cost function of Proposition 1 with non-causal
action observation and that of Proposition 2 with strictly causal action observation
is the constraint (13c). Recall that the latter is needed to ensure that Decoder 2 is able to recover
the reconstruction $\hat{X}_2$. As detailed below, the strict causality of the observation of the actions at
Decoder 2 calls for a block-based encoding in which the actions carry information about the
source sequence as observed in two different blocks, namely the current block for Decoder 1
and the future block for Decoder 2. This additional requirement causes the conditioning on $\hat{X}_2$
in (13c), which generally increases the rate (11) with respect to the counterpart (7) achievable
with non-causal action observation. A sketch of the achievability proof is provided below and
is based on the techniques proposed in [24], [20] (see also [21]). The proof of the converse is
provided in Appendix B.
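The gap between the constraints (9c) and (13c) can be checked numerically for any candidate joint pmf of $(X, \hat{X}_2, f(A))$. The sketch below is illustrative only: the function names and the array layout `p[x, x2_hat, b]` are assumptions made for this example.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def embedding_constraints(p):
    """p[x, x2_hat, b] is a joint pmf of (X, X2_hat, B) with B = f(A).

    Returns (I(X; X2_hat, B), H(B), H(B | X2_hat)), i.e. the left-hand side
    shared by (9c) and (13c) and their two right-hand sides.
    """
    p = np.asarray(p, dtype=float)
    p_x = p.sum(axis=(1, 2))
    p_x2b = p.sum(axis=0)
    mutual_info = entropy(p_x) + entropy(p_x2b) - entropy(p)
    h_b = entropy(p.sum(axis=(0, 1)))
    h_b_given_x2 = entropy(p_x2b) - entropy(p_x2b.sum(axis=1))
    return mutual_info, h_b, h_b_given_x2

# Toy case: X uniform on {0, 1} and X2_hat = f(A) = X.
toy = np.zeros((2, 2, 2))
toy[0, 0, 0] = toy[1, 1, 1] = 0.5
mi, h_b, h_b_given_x2 = embedding_constraints(toy)
```

In this toy case the left-hand side equals 1 bit, H(f(A)) equals 1 bit and $H(f(A)|\hat{X}_2)$ equals 0, so the choice is admissible under (9c) but not under (13c). Since conditioning cannot increase entropy, (13c) is never less restrictive than (9c).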

Sketch of the achievability proof: We fix a pmf (12) and define a random variable B = f(A).
The joint pmf $p(x, y, a, \hat{x}_2, u, b)$ of variables $(X, Y, A, \hat{X}_2, U, B)$ is obtained by multiplying the
right-hand side of (12) by the term $1\{b = f(a)\}$. We use the "Forward Encoding" and "Block Markov
Decoding" strategy of [24], [20] (see also [21]) and combine it with the coding scheme of [1].
The scheme operates over multiple blocks and we denote by $X^n(l)$ the portion of the source
sequence encoded in block l. The sequence $f^n(A^n(l))$ observed during block l is used in block
l + 1 by Decoder 2 due to the strict causality constraint. To this end, the action sequence $A^n(l)$
produced in block l must carry information about the source sequence $X^n(l+1)$ corresponding
to the next block l + 1. Note that this is possible since the encoder knows the entire source sequence.
At the same time, sequence $A^n(l)$ should also perform well as an action sequence to be used by
Decoder 1 to estimate sequence $X^n(l)$ for the current block. This is accomplished as follows.

In each block l, $2^{nI(X; \hat{X}_2)}$ codewords $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ are generated according to the pmf $p(\hat{x}_2)$.
Next, $2^{nI(X; \hat{X}_2)}$ bins are assigned to each codeword $\hat{X}_2^n$, where each bin contains $2^{nI(X; f(A)|\hat{X}_2)}$
codewords $B^n \in \mathcal{B}^n$ generated according to pmf $p(b|\hat{x}_2)$. For each pair $(\hat{X}_2^n, B^n)$, a codebook
of $2^{nI(X; A|\hat{X}_2, f(A))}$ codewords $A^n \in \mathcal{A}^n$ is generated according to the joint pmf $p(x, a|\hat{x}_2, b)$.
Finally, a codebook of $2^{nI(X; U|\hat{X}_2, A)}$ codewords $U^n \in \mathcal{U}^n$ is generated according to the joint pmf
$p(x, u|\hat{x}_2, a)$. The latter codebook is further binned into a codebook of rate $I(X; U|\hat{X}_2, A, Y)$ to
leverage the side information available at Decoder 1 via the Wyner-Ziv theorem [3, p. 280].

For encoding, in each block l, a sequence $\hat{X}_2^n$ is selected from the $\hat{X}_2$-codebook of block l
to be jointly typical with the source sequence $X^n(l)$ in the current block. Instead, the bin index
describes a sequence $\hat{X}_2^n$ in the $\hat{X}_2$-codebook of block l + 1 that is jointly typical with
the source sequence $X^n(l+1)$ of the (l + 1)th block. Moreover, given $\hat{X}_2^n$ and the bin index,
a sequence $A^n$ is chosen such that $(A^n, X^n(l))$ are jointly typical. Similarly, a sequence $U^n$ is
selected for block l to be jointly typical with the sequence $X^n(l)$ of block l.

Thanks to the observation of the actions, at block l + 1 Decoder 2 knows the sequence $f^n(A^n(l))$,
and aims to find the bin index in which the corresponding codeword $B^n$ lies. As shown in [20],
this is possible with vanishing probability of error if $I(X; \hat{X}_2, f(A)) < H(f(A)|\hat{X}_2)$. Note that
the conditioning in the right-hand side is due to the fact that the sequences $B^n$ are generated
conditioned on the sequence $\hat{X}_2^n$ representing a compressed version of the source for the current
block l. The latter does not bring any information regarding the desired sequence $X^n(l+1)$. □
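The bookkeeping of the block-based scheme, i.e., which source block each transmitted index refers to, can be summarized with a small schedule. This is only an illustrative sketch: the function name `block_schedule`, the dictionary keys and the use of `None` at the first and last blocks are assumptions made here, and the actual handling of the boundary blocks is not specified in the sketch above.

```python
def block_schedule(num_blocks):
    """List, for each block l, which source block every index refers to.

    - The X2_hat codeword of block l is chosen jointly typical with X^n(l).
    - The bin index sent in block l describes the X2_hat codeword of block l + 1.
    - Decoder 2 decodes in block l from the actions observed in block l - 1.
    Boundary entries are set to None (an assumption of this sketch).
    """
    schedule = []
    for l in range(1, num_blocks + 1):
        schedule.append({
            "block": l,
            "x2_codeword_describes_block": l,
            "bin_index_describes_block": l + 1 if l < num_blocks else None,
            "decoder2_uses_actions_from_block": l - 1 if l > 1 else None,
        })
    return schedule

# Example: print the schedule for four blocks.
for entry in block_schedule(4):
    print(entry)
```

The schedule makes explicit that the actions of block l serve Decoder 1 for block l while describing block l + 1 to Decoder 2, which is the origin of the conditioning on $\hat{X}_2$ in (13c).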

Remark 2. From the proof of the converse in Appendix B, it follows, similarly to [7], that
adaptive actions (see Remark 1) do not decrease the rate-distortion-cost function derived in
Proposition 2.

Figure 5. Source coding with decoder-side actions for information acquisition and with information embedding on actions. At
time i, Decoder 2 has available the samples $f^i(A^i) = (f(A_1), \ldots, f(A_i))$ in a causal fashion.



D. Causal Action Observation 

The system model for the set-up in Fig. 5 is similar to the one described in Sec. II-A, with
the only difference that the decoding function for Decoder 2 at time i is

$h_{2i}^{(d)}: \mathcal{B}^{i} \to \hat{\mathcal{X}}_2$, (14)

which maps the causally observed sequence $f^i(A^i) = (f(A_1), \ldots, f(A_i))$ into the ith estimated
symbol $\hat{X}_{2i}$.

Proposition 3. The rate-distortion-cost function $R(D_1, D_2, \Gamma)$ for the source coding problem
with decoder-side actions and causal observation of the actions illustrated in Fig. 5 is given by

$R(D_1, D_2, \Gamma) = \min_{p(v, a, u|x),\, g_1(U,Y),\, g_2(V, f(A))} I(X; V, A) + I(X; U|V, A, Y)$, (15)

where the mutual information is evaluated with respect to the joint pmf

$p(x, y, a, u, v) = p(x) p(v, a, u|x) p(y|x, a)$, (16)

for some pmf $p(v, a, u|x)$ such that the inequalities

$E[d_j(X, \hat{X}_j)] \le D_j$, for j = 1, 2, (17a)
$E[\Lambda(A)] \le \Gamma$, (17b)
and $I(X; V, f(A)) \le H(f(A)|V)$ (17c)

are satisfied for $\hat{X}_1 = g_1(U, Y)$ and $\hat{X}_2 = g_2(V, f(A))$ for some functions $g_1: \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}}_1$ and
$g_2: \mathcal{V} \times \mathcal{B} \to \hat{\mathcal{X}}_2$, respectively. Finally, U and V are auxiliary random variables whose alphabet
cardinalities can be constrained as $|\mathcal{U}| \le |\mathcal{X}||\mathcal{V}||\mathcal{A}| + 1$ and $|\mathcal{V}| \le |\mathcal{X}| + 3$, respectively, without
loss of optimality.

The difference between the rate-distortion-cost function above with causal and strictly causal
action observation is given by the fact that, with causal action observation, Decoder 2 can use
the current value of the function f(A) for the estimate $\hat{X}_2$. This is captured by the fact that
the Encoder provides Decoder 2 with an auxiliary source description V, which is then combined
with f(A) via a function $\hat{X}_2 = g_2(V, f(A))$ to obtain $\hat{X}_2$. The rate (15) and the constraint (17c)
are changed accordingly. The proof of the converse is given in Appendix B.
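One way to see how Proposition 3 relates to Proposition 2 is to restrict the auxiliary variable V. The sketch below assumes the particular choice $V = \hat{X}_2$ with $g_2(V, f(A)) = V$, i.e., a Decoder 2 that ignores the current value f(A); it is meant as an illustration only.

```latex
% Assumed choice: V = \hat{X}_2 and g_2(V, f(A)) = V.
\begin{align*}
I(X;V,A) + I(X;U|V,A,Y) &= I(X;\hat{X}_2,A) + I(X;U|\hat{X}_2,A,Y), \\
I(X;V,f(A)) \le H(f(A)|V) &\iff I(X;\hat{X}_2,f(A)) \le H(f(A)|\hat{X}_2).
\end{align*}
```

Under this choice, (15) and (17c) coincide with (11) and (13c), so the rate-distortion-cost function with causal observation is no larger than that with strictly causal observation.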

Remark 3. As seen in Appendix B, with adaptive actions, the rate-distortion-cost function derived
in Proposition 3 remains unchanged.

E. Binary Example 

In this section, an example is provided to illustrate the effect of the communication requirements
of the additional decoder (Decoder 2) that observes a function of the actions on the system
performance. We assume binary alphabets as $\mathcal{X} = \mathcal{A} = \mathcal{Y} = \{0, 1\}$ and a source distribution
$X \sim \mathrm{Bern}(1/2)$. The distortion metrics are assumed to be Hamming, i.e., $d_j(x, \hat{x}_j) = 0$ if $x = \hat{x}_j$
and $d_j(x, \hat{x}_j) = 1$ otherwise for j = 1, 2. Moreover, as shown in Fig. 6, the side information Y at
Decoder 1 is observed through a Z-channel for A = 0 or an S-channel for A = 1. We assume no
cost constraint on the actions taken by Decoder 1 (which can be enforced by choosing $\Lambda(A) = A$
and $\Gamma = 1$), and we set f(A) = A. The example extends that of [1, Sec. II-D] to a set-up with
the additional Decoder 2. Under the requirement of lossless reconstruction at Decoder 1, i.e.,
$D_1 = 0$, the rate-distortion-cost function $R(0, D_2, \Gamma = 1)$ with non-causal action observation is
obtained from Proposition 1 by setting $U = \hat{X}_1 = X$, obtaining

$R(0, D_2, 1) = \min_{p(\hat{x}_2, a|x)} I(X; \hat{X}_2, A) + H(X|\hat{X}_2, A, Y)$, (18)

where the minimization is done under the constraints $E[d_2(X, \hat{X}_2)] \le D_2$ and $I(X; \hat{X}_2, A) \le
H(A)$.
Figure 6. The side information channel $p(y|x, a)$ used in the example of Sec. II-E (panels: A = 0 and A = 1).

The minimization in (18) can be done over the parameters $p(a = 1, \hat{x}_2 = 0|x = 0) = \alpha_1$,
$p(a = 1, \hat{x}_2 = 1|x = 0) = \alpha_2$ and $p(a = 0, \hat{x}_2 = 1|x = 0) = \alpha_3$ with $\sum_{i=1}^{3} \alpha_i \le 1$ and $\alpha_i \ge 0$ for
i = 1, 2, 3, since, by symmetry, we can set $p(a = 0, \hat{x}_2 = 1|x = 1) = \alpha_1$, $p(a = 0, \hat{x}_2 = 0|x =
1) = \alpha_2$ and $p(a = 1, \hat{x}_2 = 0|x = 1) = \alpha_3$ without loss of optimality. Explicit expressions can
be easily found and have been optimized numerically.
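The numerical optimization of (18) over $(\alpha_1, \alpha_2, \alpha_3)$ can be sketched with a coarse grid search. The code below is a minimal illustration and not the procedure used to produce the figures: the orientation of the Z- and S-channels (flip probability $\delta$ on input 1 for A = 0 and on input 0 for A = 1), the grid resolution and all function names are assumptions of this sketch. The `strictly_causal` flag replaces the constraint $I(X; \hat{X}_2, A) \le H(A)$ with its counterpart (13c).

```python
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def joint_pmf(alpha, delta):
    """Joint pmf p[x, a, x2_hat, y] for parameters (alpha_1, alpha_2, alpha_3)."""
    a1, a2, a3 = alpha
    rest = max(1.0 - a1 - a2 - a3, 0.0)
    cond = np.zeros((2, 2, 2))  # p(a, x2_hat | x), indexed [x, a, x2_hat]
    cond[0, 1, 0], cond[0, 1, 1], cond[0, 0, 1], cond[0, 0, 0] = a1, a2, a3, rest
    cond[1, 0, 1], cond[1, 0, 0], cond[1, 1, 0], cond[1, 1, 1] = a1, a2, a3, rest
    ch = np.zeros((2, 2, 2))    # p(y | x, a), indexed [x, a, y]
    ch[0, 0], ch[1, 0] = [1, 0], [delta, 1 - delta]  # ASSUMED Z-channel (A = 0)
    ch[0, 1], ch[1, 1] = [1 - delta, delta], [0, 1]  # ASSUMED S-channel (A = 1)
    return 0.5 * cond[:, :, :, None] * ch[:, :, None, :]

def evaluate(alpha, delta):
    """Return (objective of (18), I(X; X2_hat, A), H(A), H(A | X2_hat), D2)."""
    p = joint_pmf(alpha, delta)
    p_xax2 = p.sum(axis=3)
    p_ax2 = p_xax2.sum(axis=0)
    mi = entropy(p_xax2.sum(axis=(1, 2))) + entropy(p_ax2) - entropy(p_xax2)
    h_x_given_rest = entropy(p) - entropy(p.sum(axis=0))
    h_a = entropy(p_ax2.sum(axis=1))
    h_a_given_x2 = entropy(p_ax2) - entropy(p_ax2.sum(axis=0))
    return mi + h_x_given_rest, mi, h_a, h_a_given_x2, alpha[1] + alpha[2]

def min_rate(d2, delta, strictly_causal=False, grid=20, tol=1e-9):
    """Grid-search approximation of the minimum in (18)."""
    best = float("inf")
    for i, j, k in itertools.product(range(grid + 1), repeat=3):
        if i + j + k > grid:
            continue
        alpha = (i / grid, j / grid, k / grid)
        rate, mi, h_a, h_a_given_x2, dist = evaluate(alpha, delta)
        budget = h_a_given_x2 if strictly_causal else h_a
        if dist <= d2 + tol and mi <= budget + tol:
            best = min(best, rate)
    return best
```

A call such as `min_rate(0.2, 0.5)` gives a grid approximation of $R(0, 0.2, 1)$ for $\delta = 0.5$ under the assumed channel; a finer grid or a continuous optimizer would be needed for accurate curves.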

Fig. 7 depicts the rate-distortion function versus the distortion $D_2$ of Decoder 2 for the channel
parameter values $\delta = 0.2$, $\delta = 0.5$ and $\delta = 0.8$. It can be seen that if the distortion $D_2$ tolerated by Decoder
2 is sufficiently large (e.g., $D_2 \ge 0.4$ for $\delta = 0.5$), then the communication requirements
of Decoder 2 do not increase the required rate. This can be observed by comparing the rate
$R(0, D_2, \Gamma)$ with the rate $R(0, 0.5, \Gamma)$ corresponding to a distortion level $D_2 = 0.5$, which requires
no communication to Decoder 2. The smallest distortion $D_2$ that does not affect the rate can be
found as $D_2 = \alpha_{2,\mathrm{opt}} + \alpha_{3,\mathrm{opt}}$, where $\alpha_{2,\mathrm{opt}}$ and $\alpha_{3,\mathrm{opt}}$ are the optimal values for problem (18)
with $D_2 = 0.5$ that minimize $\alpha_2 + \alpha_3$.

We now compare the performance between non-causal action observation, as considered above,
and strictly causal action observation. The performance in the latter case can be obtained from
Proposition 2 and leads to (18) with the more restrictive constraint (13c). Fig. 8 plots the
difference between the rate-distortion function (18) for the cases of non-causal and strictly causal
action observation versus $\delta$ for three values of distortion, namely $D_2 = 0.1, 0.2, 0.3$. As shown,
irrespective of the value of distortion $D_2$, for $\delta = 0$ and $\delta = 1$, the performance with
non-causal action observation is equal to that with strictly causal observation. This is due to the
facts that: i) for $\delta = 0$, the side information Y is a noiseless measure of the source X
for both A = 0 and A = 1, and thus there is no gain in making the actions at Decoder 1
dependent on X, and thus on $\hat{X}_2$; ii) for $\delta = 1$, the side information Y is independent of the source
X given both A = 0 and A = 1, and thus it is without loss of optimality to choose
actions at Decoder 1 to be independent of X and $\hat{X}_2$. We can conclude that for both $\delta = 0$ and
$\delta = 1$, causal action observation, and in fact even selecting A to be independent of X, does
not entail any performance loss. Instead, for values $0 < \delta < 1$, it is generally advantageous for
Decoder 1 to select actions correlated with the source X, and hence some performance loss is
observed with strictly causal action observation owing to the more restrictive constraint (13c).
This reflects the need to cater to both Decoder 1 and Decoder 2 when selecting actions A, which
requires description of two different source blocks. Following similar arguments, it is also noted
that, as the communication requirements for Decoder 2 become more pronounced, i.e., as $D_2$
decreases, the difference between the rate-distortion function with non-causal and strictly causal
action observation increases. The performance with causal action observation is intermediate
between full and strictly causal observation, and it is not shown here.

Figure 7. Rate-distortion function $R(0, D_2, 1)$ in (18) versus distortion $D_2$ with the side information channel in Fig. 6
(non-causal side information).
III. Encoder-Side Actions for Side Information Acquisition

In the previous section, the actions controlling the quality and availability of the side information were taken by the decoder. In this section, following [1, Sec. III], we consider instead scenarios in which the encoder takes the actions affecting the side information of Decoder 1, as shown in Fig. 2. Specifically, the encoder takes actions $A^n \in \mathcal{A}^n$, thus influencing the side information available to Decoder 1 through a discrete memoryless channel $p(y \mid x, a)$. Decoder 2 observes the action sequence to obtain the deterministic function $f(A^n) = (f(A_1), \ldots, f(A_n))$, or the corresponding causal and strictly causal function, which is used to estimate the source sequence subject to a distortion constraint.

An $(n, R, D_1, D_2, \Gamma)$ code is defined similarly to the previous sections, with the difference that the action encoder maps the source sequence $X^n$ directly into the action sequence $A^n$, i.e.,

$$h^{(a)}: \mathcal{X}^n \to \mathcal{A}^n. \tag{19}$$

As discussed in [1], even in the absence of Decoder 2, the problem at hand is challenging. We thus focus on certain special cases, first the special case in which the side information channel $p(y \mid x, a)$ is such that $Y$ is a deterministic function of $A$, i.e., $Y = f_Y(A)$, and $f(A) = A$. This is solved in Proposition 4 and generalized by the following remark to the case of all deterministic functions $f$ for which $H(f_Y(A) \mid f(A)) = 0$. Following Proposition 4 is Proposition 5, where the case $H(f(A) \mid f_Y(A)) = 0$ is solved.

Proposition 4. The rate-distortion-cost function $R(D_1, D_2, \Gamma)$ for the source coding problem with encoder-side actions, non-causal, causal or strictly causal observation of the actions illustrated in Fig. 2 with $f(A) = A$ and $Y = f_Y(A)$ is given by

$$R(D_1, D_2, \Gamma) = \min \{ I(X; \hat{X}_1, U) - H(f_Y(A)) \}^{+}, \tag{20}$$

where the information measures are evaluated with respect to the joint pmf

$$p(x, u, \hat{x}_1, \hat{x}_2, a) = p(x)\, p(u \mid x)\, p(\hat{x}_1 \mid u, x)\, p(\hat{x}_2 \mid u, x)\, p(a), \tag{21}$$

for some pmfs $p(u \mid x)$, $p(\hat{x}_1 \mid u, x)$, $p(\hat{x}_2 \mid u, x)$ and $p(a)$ such that the inequalities

$$\mathrm{E}[d_j(X, \hat{X}_j)] \le D_j, \quad \text{for } j = 1, 2, \tag{22a}$$

$$\mathrm{E}[\Lambda(A)] \le \Gamma, \tag{22b}$$

$$I(X; U) \le H(f_Y(A)) \tag{22c}$$

$$\text{and } I(X; \hat{X}_2 \mid U) \le H(A \mid f_Y(A)) \tag{22d}$$

are satisfied. Finally, $U$ is an auxiliary random variable whose alphabet cardinality can be constrained as $|\mathcal{U}| \le |\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2| + 3$ without loss of optimality.
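As a rough illustration of how the objective in (20) can be evaluated for one fixed choice of pmfs, the following sketch computes $\{I(X; \hat{X}_1, U) - H(f_Y(A))\}^{+}$ on a toy instance. The helper names (`entropy`, `image_pmf`, `mutual_information`, `prop4_rate`) and the toy alphabets are assumptions made for this example only; the sketch does not perform the minimization in (20) and does not check the constraints (22a)-(22d).

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def image_pmf(pmf, fn):
    """pmf of fn(A) when A is distributed according to pmf."""
    out = {}
    for a, p in pmf.items():
        out[fn(a)] = out.get(fn(a), 0.0) + p
    return out

def mutual_information(joint):
    """I(X;Z) in bits for a joint pmf given as {(x, z): probability}."""
    px, pz = {}, {}
    for (x, z), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        pz[z] = pz.get(z, 0.0) + p
    return sum(p * math.log2(p / (px[x] * pz[z]))
               for (x, z), p in joint.items() if p > 0)

def prop4_rate(joint_x_desc, p_a, f_y):
    """Objective of (20) for one fixed choice of pmfs (hypothetical helper).

    joint_x_desc maps (x, (x1_hat, u)) to its probability, so that its
    mutual information is I(X; X1_hat, U).
    """
    gap = mutual_information(joint_x_desc) - entropy(image_pmf(p_a, f_y))
    return max(0.0, gap)

# Toy instance: X ~ Bern(1/2) described losslessly, so I(X; X1_hat, U) = 1 bit.
joint = {(0, (0, 0)): 0.5, (1, (1, 1)): 0.5}
p_a = {a: 0.25 for a in range(4)}                       # four equiprobable actions
rate_parity = prop4_rate(joint, p_a, lambda a: a % 2)   # H(f_Y(A)) = 1 bit
rate_blind = prop4_rate(joint, p_a, lambda a: 0)        # H(f_Y(A)) = 0 bits
```

In this toy case the action channel to Decoder 1 carries one bit per symbol when $f_Y$ returns the parity of the action, which removes the need for the direct link, while a constant $f_Y$ leaves the full bit to be sent at rate $R$.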

Remark 4. The result above generalizes a number of known single-letter characterizations. Notably, if $D_2 = D_{2,\max}$, so that the distortion requirements of Decoder 2 are immaterial to the system performance, the result reduces to [1, Theorem 7]. Moreover, in the special case in which $A = (A_0, A_2)$, $Y = A_0$, $R = R_1$, $|\mathcal{A}_0| = 2^{R_0}$, $|\mathcal{A}_2| = 2^{R_2}$,¹ the model coincides with the lossy Gray-Wyner problem.

¹Note that here $2^{R_0}$ and $2^{R_2}$ are constrained to be integers.

As detailed in the proof below, Proposition 4 establishes the optimality of separate source-channel coding for the set-up in Fig. 2 under the stated conditions. In particular, the encoder compresses using a standard successive refinement source code in which $U$ represents the coarse description and $\hat{X}_1$, $\hat{X}_2$ two independent refinements. The indices of the coarse description $U$ and of the refined description $\hat{X}_2$ are sent on the degraded (deterministic) broadcast channel with input $A$ and outputs $(A, f_Y(A))$ using superposition coding. Reliable compression and communication is guaranteed by the two bounds (22c)-(22d). A further refined description $\hat{X}_1$ is produced for Decoder 1, and the corresponding index is sent partly over the mentioned broadcast channel and partly over the link of rate $R$, leading to the rate (20). Details of the achievability proof can be found below, while the proof of the converse is given in Appendix C.

Remark 5. Following the discussion above, specializing Proposition 4 to the case $R = 0$ shows the optimality of source-channel coding separation for the lossy transmission of a source over a deterministic degraded broadcast channel (see [3, Chapter 14] for a review of scenarios in which the optimality of separation holds for lossless transmission over a broadcast channel).

Sketch of the achievability proof: As anticipated above, achievability uses the ideas of source-channel coding separation, successive refinement and superposition coding. We only describe the outline, as the rigorous details can be derived based on standard techniques. We start with the case of non-causal action observation at Decoder 2. Note that the channel with input $A$ and outputs $A$ (to Decoder 2) and $f_Y(A)$ (to Decoder 1) is not only deterministic but also degraded [3, Chapter 5]. This channel is used to send a common source description of rate $R_1$ to both decoders and a refined description of rate $R_2$ to Decoder 2 only. To elaborate, fix the pmfs $p(u \mid x)$, $p(\hat{x}_1 \mid u, x)$, $p(\hat{x}_2 \mid u, x)$ and $p(a)$. Generate a codebook of $2^{nI(X;U)}$ sequences $U^n$ i.i.d. with the pmf $p(u)$ and, for each $U^n$ sequence, generate a codebook of $2^{nI(X;\hat{X}_1 \mid U)}$ sequences $\hat{X}_1^n$ i.i.d. with pmf $p(\hat{x}_1 \mid u)$ and a codebook of $2^{nI(X;\hat{X}_2 \mid U)}$ sequences $\hat{X}_2^n$ i.i.d. with pmf $p(\hat{x}_2 \mid u)$. Given a source sequence $X^n$, the encoder finds a jointly typical $U^n$ codeword, and then a codeword $\hat{X}_1^n$ jointly typical with $(X^n, U^n)$, and similarly for $\hat{X}_2^n$. Using source-channel separation on the broadcast "action" channel described above, the index from the $U$-codebook and a part of the index from the $\hat{X}_1$-codebook, of rate $r$, are described to both decoders, and the index from the $\hat{X}_2$-codebook is described to Decoder 2 as its private information. Thus, we have the inequalities

$$R_1 \ge I(X; U) + r \tag{23a}$$

$$\text{and } R_2 \ge I(X; \hat{X}_2 \mid U). \tag{23b}$$

The capacity region of the broadcast channel is given by the conditions [3, Chapter 9] $R_1 \le H(f_Y(A))$ and $R_1 + R_2 \le H(A)$, and thus the following rates are achievable

$$R_1 \le H(f_Y(A)) \tag{24a}$$

$$\text{and } R_2 \le H(A \mid f_Y(A)). \tag{24b}$$

Finally, the remaining part of the index of codeword $\hat{X}_1^n$ is sent through the direct link of rate $R$, leading to the condition

$$R \ge I(X; \hat{X}_1 \mid U) - r. \tag{25}$$

Combining (23), (24) and (25), and using Fourier-Motzkin elimination, we obtain

$$R \ge I(X; \hat{X}_1, U) - H(f_Y(A)) \tag{26}$$

and (22c)-(22d). The distortion and cost constraints are handled in a standard manner and hence the details are omitted.

It remains to discuss how to handle the case of causal or strictly causal action observation. Given the converse result in Appendix C, it is enough to show that (20)-(22) is achievable also with strictly causal and causal action observation. This can be simply accomplished by encoding in blocks as per the achievability of Proposition 2 and Proposition 3. Specifically, in each block the encoder compresses the source sequence corresponding to the next block. Decoder 2 then operates as above, while Decoder 1 can recover all source blocks at the end of all blocks. □

Remark 6. The scenario solved above is when the action observation is perfect, i.e., $f(A) = A$. The result also carries over verbatim to the more general case where $f(A)$ is a generic function as long as $H(f_Y(A) \mid f(A)) = 0$. The expressions of the rate region remain the same as in the proposition above except that $A$ is replaced by $f(A)$.

Proposition 4 characterizes the optimal performance for the case when Decoder 2 has better information about the actions taken by the encoder than Decoder 1, in the sense that $H(f_Y(A) \mid f(A)) = 0$. We note here that a similar characterization can be given also for the dual setting in which $H(f(A) \mid f_Y(A)) = 0$, so that Decoder 1 has the better observation about the actions.

Proposition 5. The rate-distortion-cost function $R(D_1, D_2, \Gamma)$ for the source coding problem with encoder-side actions, non-causal, causal or strictly causal observation of the actions illustrated in Fig. 2 with $H(f(A) \mid f_Y(A)) = 0$, is given by

$$R(D_1, D_2, \Gamma) = \min_{p(a),\, p(\hat{x}_1, \hat{x}_2 \mid x)} \{ I(X; \hat{X}_1, \hat{X}_2) - H(f_Y(A)) \}^{+}, \tag{27}$$

where the information measures are evaluated with respect to the joint pmf

$$p(x, \hat{x}_1, \hat{x}_2, a) = p(x)\, p(\hat{x}_1, \hat{x}_2 \mid x)\, p(a), \tag{28}$$

such that the following inequalities are satisfied,

$$\mathrm{E}[d_j(X, \hat{X}_j)] \le D_j, \quad \text{for } j = 1, 2, \tag{29a}$$

$$\mathrm{E}[\Lambda(A)] \le \Gamma, \tag{29b}$$

$$I(X; \hat{X}_2) \le H(f(A)). \tag{29c}$$

The converse follows similarly to that of Proposition 4, where instead of $U$ in the converse we use $\hat{X}_2$, as knowing $Y^n = f_Y(A^n)$ implies knowing $f(A^n)$, due to the assumption $H(f(A) \mid f_Y(A)) = 0$. We just outline the achievability for the non-causal case (the achievability for the strictly causal and causal cases uses block coding ideas as in Proposition 4). A successive refinement codebook is generated by drawing $2^{nI(X;\hat{X}_2)}$ codewords $\hat{X}_2^n$, and, for each codeword $\hat{X}_2^n$, a number $2^{nI(X;\hat{X}_1 \mid \hat{X}_2)}$ of codewords $\hat{X}_1^n$. As for Proposition 4, the indices of these two codebooks obtained via standard joint typicality encoding are sent through the degraded broadcast channel $p(y, b \mid a) = 1_{\{y = f_Y(a),\, b = f(a)\}}$. Splitting the rate for the index of codeword $\hat{X}_1^n$ so that a rate $R$ is sent over the direct link to Decoder 1, reliability of compression and communication over the "action" broadcast channel is guaranteed if

$$I(X; \hat{X}_2) \le H(f(A)) \tag{30}$$

$$I(X; \hat{X}_2) + I(X; \hat{X}_1 \mid \hat{X}_2) - R \le H(f(A), f_Y(A)) = H(f_Y(A)), \tag{31}$$

where the latter inequality implies $R \ge I(X; \hat{X}_1, \hat{X}_2) - H(f_Y(A))$. The proof is concluded using the usual steps. □
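For readers who want the last implication spelled out, the display below is a worked restatement of (31) using only the chain rule for mutual information and the assumption $H(f(A) \mid f_Y(A)) = 0$; it adds no new condition.

```latex
\begin{align*}
H(f(A), f_Y(A)) &= H(f_Y(A)) + H(f(A) \mid f_Y(A)) = H(f_Y(A)), \\
I(X;\hat{X}_2) + I(X;\hat{X}_1 \mid \hat{X}_2) &= I(X;\hat{X}_1,\hat{X}_2), \\
I(X;\hat{X}_1,\hat{X}_2) - R \le H(f_Y(A))
  \quad &\Longleftrightarrow \quad
  R \ge I(X;\hat{X}_1,\hat{X}_2) - H(f_Y(A)).
\end{align*}
```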

IV. Actions for Channel State Control and Probing 

In this section, we consider the impact of information embedding on actions for the set-up of channel coding with actions of [2]. To this end, we consider the model in Fig. 3, in which Decoder 1, based on the observation of a deterministic function of the actions, wishes to retrieve part of the information destined to Decoder 2. Note that for simplicity of notation here the additional decoder that observes the actions is denoted as Decoder 1, rather than Decoder 2 as done above. Also, we emphasize that in the original set-up of [2], Decoder 1 was not present.

A. System Model 

The system is defined by the pmfs $p(x)$, $p(y \mid x, s, a)$, $p(s \mid a)$, function $f: \mathcal{A} \to \mathcal{B}$ and by discrete alphabets $\mathcal{X}$, $\mathcal{A}$, $\mathcal{B}$, $\mathcal{S}$, and $\mathcal{Y}$. Given the messages $(M_1, M_2)$, selected randomly from the set $\mathcal{M}_1 \times \mathcal{M}_2 = [1, 2^{nR_1}] \times [1, 2^{nR_2}]$, an action sequence $A^n \in \mathcal{A}^n$ is selected by the Encoder. Decoder 1 observes the signal $B^n = f(A^n)$ as a deterministic function of the actions, and estimates message $M_1$. Note that the notation here implies a "non-causal" observation of the actions, but it is easy to see that the results below hold also with causal and strictly causal observation of the actions. Moreover, the state sequence $S^n \in \mathcal{S}^n$ is generated as the output of a memoryless channel $p(s \mid a)$ and we have $p(b^n, s^n \mid a^n) = \prod_{i=1}^{n} p(s_i \mid a_i) 1_{\{b_i = f(a_i)\}}$ for an action sequence $A^n = a^n$. The input sequence $X^n \in \mathcal{X}^n$ is selected on the basis of both messages $(M_1, M_2)$ and of the state sequence $S^n$ by the Encoder. The action sequence $A^n$ and the input $X^n$ have to satisfy an average cost constraint defined by a function $\gamma: \mathcal{A} \times \mathcal{X} \to [0, \infty)$, so that the cost for the input sequences $a^n$ and $x^n$ is given by $\gamma(a^n, x^n) = \frac{1}{n} \sum_{i=1}^{n} \gamma(a_i, x_i)$. Given $X^n = x^n$, $S^n = s^n$ and $A^n = a^n$, the received signal is distributed as $p(y^n \mid x^n, s^n, a^n) = \prod_{i=1}^{n} p(y_i \mid x_i, s_i, a_i)$.

Decoder 2, having received the signal $Y^n$, estimates both messages $(M_1, M_2)$.

The setting includes the semi-deterministic broadcast channel with degraded message sets [25] (see also [3, Ch. 8]) as a special case by setting $X$ to be constant and $Y = S$, and the channel with action-dependent states studied in [2] for $R_1 = 0$.

Definition 4. An $(n, R_1, R_2, \Gamma, \epsilon)$ code for the model in Fig. 3 consists of an action encoder

$$h^{(a)}: \mathcal{M}_1 \times \mathcal{M}_2 \to \mathcal{A}^n, \tag{32}$$

which maps message $(M_1, M_2)$ into an action sequence $A^n$; a channel encoder

$$h^{(e)}: \mathcal{M}_1 \times \mathcal{M}_2 \times \mathcal{S}^n \to \mathcal{X}^n, \tag{33}$$

which maps message $(M_1, M_2)$ and the state sequence $S^n$ into the sequence $X^n$; two decoding functions

$$h_1^{(d)}: \mathcal{B}^n \to \mathcal{M}_1, \tag{34}$$

$$\text{and } h_2^{(d)}: \mathcal{Y}^n \to \mathcal{M}_1 \times \mathcal{M}_2, \tag{35}$$

which map the sequences $B^n$ and $Y^n$ into the estimated messages $\hat{M}_1$ and $(\hat{M}_1, \hat{M}_2)$, respectively; such that the probability of error in decoding the messages $(M_1, M_2)$ is small,

$$\Pr[h_1^{(d)}(B^n) \ne M_1] \le \epsilon, \tag{36}$$

$$\text{and } \Pr[h_2^{(d)}(Y^n) \ne (M_1, M_2)] \le \epsilon, \tag{37}$$

and the cost constraint is satisfied, i.e.,

$$\frac{1}{n} \sum_{i=1}^{n} \mathrm{E}[\gamma(A_i, X_i)] \le \Gamma + \epsilon. \tag{38}$$

Given a cost $\Gamma$, a rate pair $(R_1, R_2)$ is said to be achievable for a cost constraint $\Gamma$ if, for any $\epsilon > 0$ and sufficiently large $n$, there exists an $(n, R_1, R_2, \Gamma, \epsilon)$ code. We are interested in characterizing the capacity-cost region $\mathcal{C}(\Gamma)$, which is the closure of all achievable rate pairs $(R_1, R_2)$ for the given cost $\Gamma$.

B. Capacity-Cost Region 

In this section, a single-letter characterization of the capacity-cost region is derived. 

Proposition 6. The capacity-cost region $\mathcal{C}(\Gamma)$ for the system in Fig. 3 is given by the union of all rate pairs $(R_1, R_2)$ such that the inequalities

$$R_1 \le H(f(A)) \tag{39a}$$

$$\text{and } R_1 + R_2 \le I(A, U; Y) - I(U; S \mid A), \tag{39b}$$

are satisfied, where the mutual informations are evaluated with respect to the joint pmf

$$p(a, s, u, x, y) = p(a)\, p(s \mid a)\, p(u \mid s, a)\, 1_{\{x = g(u, s)\}}\, p(y \mid x, s, a), \tag{40}$$

for some pmfs $p(a)$, $p(u \mid s, a)$ and function $g: \mathcal{U} \times \mathcal{S} \to \mathcal{X}$ such that

$$\mathrm{E}[\gamma(A, X)] \le \Gamma. \tag{41}$$
Finally, we can set $|\mathcal{U}| \le |\mathcal{A}||\mathcal{S}||\mathcal{X}| + 1$ without loss of optimality.
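The two bounds (39a)-(39b) are straightforward to evaluate numerically once a joint pmf of the form (40) is fixed. The sketch below is one way to do so for finite alphabets; the function names (`joint_entropy`, `cond_mi`, `prop6_bounds`), the tuple ordering `(a, s, u, x, y)` and the toy pmf are assumptions introduced for this example, and the cost constraint (41) is not checked.

```python
import math
from collections import defaultdict

def joint_entropy(joint, idx):
    """Entropy in bits of the coordinates idx of a joint pmf {tuple: prob}."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def cond_mi(joint, xs, ys, zs=()):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    xs, ys, zs = tuple(xs), tuple(ys), tuple(zs)
    return (joint_entropy(joint, xs + zs) + joint_entropy(joint, ys + zs)
            - joint_entropy(joint, xs + ys + zs) - joint_entropy(joint, zs))

A, S, U, X, Y = range(5)  # coordinate positions inside each outcome tuple

def prop6_bounds(joint, f):
    """Return (bound on R1 from (39a), bound on R1 + R2 from (39b))."""
    image = defaultdict(float)
    for outcome, p in joint.items():
        image[f(outcome[A])] += p
    h_f = -sum(p * math.log2(p) for p in image.values() if p > 0)
    sum_bound = cond_mi(joint, (A, U), (Y,)) - cond_mi(joint, (U,), (S,), (A,))
    return h_f, sum_bound

# Toy pmf: A ~ Bern(1/2), S = A, U constant, X = g(U, S) = S, Y = X (noiseless).
toy = {(0, 0, 0, 0, 0): 0.5, (1, 1, 0, 1, 1): 0.5}
r1_max, sum_max = prop6_bounds(toy, lambda a: a)
```

For this toy pmf both bounds equal one bit, so the whole sum-rate can be assigned to the message decoded from the actions; replacing `f` by a constant function drives the bound on $R_1$ to zero while leaving the sum-rate bound unchanged.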
Figure 9. Channel coding with actions for channel state probing and with information embedding on actions.



The proof of the converse is an immediate consequence of cut-set arguments and of the proof of the upper bound obtained in [2, Theorem 1]. Specifically, inequality (39a) follows by considering the cut around Decoder 1, while the inequality (39b) coincides with the bound derived in [2, Theorem 1] on the rate that can be communicated between the Encoder and Decoder 2 with no regard for Decoder 1.² The achievability requires rate splitting, superposition coding and the coding strategy proposed in [2, Theorem 1]. A sketch of the proof of achievability is relegated to Appendix D.

C. Probing Capacity 

Here we provide an example that illustrates the effect of the communication requirements of the action-cribbing decoder on the system performance. Consider the communication system shown in Fig. 9, where the state $S$ is known to Decoder 2. We further assume binary actions, such that, if $A = 1$, the channel encoder observes the state $S$, and if $A = 0$, it does not obtain any information about $S$. We model this problem by defining the state information available at the encoder as $S_e = u(S, A)$, where $u(S, 1) = S$ and $u(S, 0) = \mathrm{e}$, where $\mathrm{e}$ represents an "erasure" symbol. Following [14], we refer to this problem as having a "probing" encoder.

The channel encoder maps the state information $S_e^n$ and messages $M_1$, $M_2$ into a codeword $X^n$ (see Fig. 9). Moreover, two cost constraints, namely $\frac{1}{n} \sum_{i=1}^{n} \mathrm{E}[\gamma_a(A_i)] \le \Gamma_A$ and $\frac{1}{n} \sum_{i=1}^{n} \mathrm{E}[\gamma_x(X_i)] \le \Gamma_X$, are imposed for given action and input cost functions $\gamma_a: \mathcal{A} \to [0, \Lambda_{a,\max}]$ and $\gamma_x: \mathcal{X} \to [0, \Lambda_{x,\max}]$ with $0 \le \Lambda_{a,\max} < \infty$ and $0 \le \Lambda_{x,\max} < \infty$, respectively. In [14, Theorem 1], a correspondence was proved between the set-up of a probing encoder and that of action-dependent states. Using [14, Theorem 1] and Proposition 6, we can easily obtain that the capacity-cost region $\mathcal{C}(\Gamma_A, \Gamma_X)$ for the system in Fig. 9 is given by the union of all rate pairs $(R_1, R_2)$ such that the inequalities

$$R_1 \le H(A \mid Q) \tag{42a}$$

$$\text{and } R_1 + R_2 \le I(X; Y \mid S, Q), \tag{42b}$$

are satisfied, where the mutual informations are evaluated with respect to the joint pmf

$$p(q, a, s, s_e, x, y) = p(q)\, p(a \mid q)\, p(s)\, 1_{\{s_e = u(s, a)\}}\, p(x \mid s_e, a, q)\, p(y \mid x, s), \tag{43}$$

for some pmfs $p(q)$, $p(a \mid q)$, $p(x \mid s_e, a, q)$ such that $\mathrm{E}[\gamma_a(A)] \le \Gamma_A$ and $\mathrm{E}[\gamma_x(X)] \le \Gamma_X$.

²The cardinality constraints follow from [2, Theorem 1].

We now apply (42a)-(42b) to the channel shown in Fig. 9, in which the alphabets are binary, $\mathcal{X} = \mathcal{Y} = \mathcal{S} = \{0, 1\}$, $S$ is a Bern$(1 - \epsilon)$ variable for $0 \le \epsilon \le 1$, and the channel is binary symmetric with flipping probability 0.5 if $S = 0$ ("bad" channel state) and 0 if $S = 1$ ("good" channel state).

To evaluate the maximum achievable sum-rate $R_1 + R_2$ for a given rate $R_1$, we define $\Pr[A = 1] = \gamma$, $\Pr[X = 1 \mid S_e = 1, A = 1] = p_1$ and $\Pr[X = 1 \mid S_e = \mathrm{e}, A = 0] = p_2$, and we set $\Pr[X = 1 \mid S_e = 0, A = 1] = 0$ without loss of optimality. The maximum sum-rate $R_1 + R_2$ for a given rate $R_1$ is then obtained from (42b) by solving the problem

$$R_1 + R_2 = \max_{0 \le p_1, p_2, \gamma \le 1} \gamma (1 - \epsilon) H(p_1) + (1 - \gamma)(1 - \epsilon) H(p_2), \tag{44}$$

under the constraints $\mathrm{E}[X] = p_1 \gamma (1 - \epsilon) + p_2 (1 - \gamma) \le \Gamma_X$, $\mathrm{E}[A] = \gamma \le \Gamma_A$ and $H(A) = H(\gamma) \ge R_1$. Note that the last constraint imposes that the rate achievable by Decoder 1 is no smaller than $R_1$ as per (42a).
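Problem (44) has only three scalar variables, so a coarse grid search is enough to get a feel for its behavior. The following is a minimal sketch under that assumption; the function name `max_sum_rate`, the grid resolution `steps` and the default parameter values are choices made for this illustration and are not the numerical procedure used to produce the figures in the paper.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_sum_rate(gamma_x, r1, eps=0.5, gamma_a=1.0, steps=50):
    """Grid-search approximation of (44) for 0 <= r1 <= 1.

    gamma_x, gamma_a: input and action cost budgets; r1: rate required by
    Decoder 1; eps: Pr[S = 0]. Returns the best sum-rate found on the grid.
    """
    grid = [i / steps for i in range(steps + 1)]
    ent = [h2(p) for p in grid]          # precomputed binary entropies
    best = 0.0                           # stays 0 if nothing better is feasible
    for g in grid:                       # g plays the role of Pr[A = 1]
        if g > gamma_a or h2(g) < r1:    # action cost and H(A) >= R1
            continue
        for i, p1 in enumerate(grid):
            for j, p2 in enumerate(grid):
                cost = p1 * g * (1 - eps) + p2 * (1 - g)
                if cost <= gamma_x:      # input cost constraint E[X] <= Gamma_X
                    rate = (1 - eps) * (g * ent[i] + (1 - g) * ent[j])
                    best = max(best, rate)
    return best
```

With a loose input cost budget the search returns $1 - \epsilon$ regardless of $R_1$, while for a tight budget the feasible set shrinks as $R_1$ grows, so the value returned cannot increase with $R_1$.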

The sum-rate in (44) is shown in Fig. 10 for $\epsilon = 0.5$, $\Gamma_A = 1$ and different values of $R_1$. It can be seen that, for sufficiently small values of the cost constraint $\Gamma_X$, increasing the communication requirements, i.e., $R_1$, of Decoder 1 reduces the achievable sum-rate $R_1 + R_2$. This is due to the fact that increasing $R_1$ requires encoding more information in the action sequence, which in turn reduces the portion of the actions that can be set to $A = 1$, i.e., $\Pr[A = 1]$. As a result, the encoder is less informed about the state sequence and thus bound to waste some power on bad channel states.
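The cap that the constraint $H(\gamma) \ge R_1$ places on the probing fraction $\Pr[A = 1]$ can be made concrete by inverting the binary entropy function on $[0.5, 1]$. The bisection below is a small sketch of that computation; the function name `max_probing_fraction` and the iteration count are assumptions of this example.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_probing_fraction(r1, iters=60):
    """Largest gamma = Pr[A = 1] with h2(gamma) >= r1, for 0 <= r1 <= 1."""
    lo, hi = 0.5, 1.0            # h2 decreases from 1 to 0 on [0.5, 1]
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h2(mid) >= r1:
            lo = mid             # mid is still feasible, move right
        else:
            hi = mid             # mid violates the entropy constraint
    return lo
```

For instance, a requirement of $R_1 = 0.9$ bits leaves the encoder free to probe on at most roughly 68% of the channel uses, whereas $R_1 = 0$ allows probing all the time.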




Figure 10. Sum-rate $R_1 + R_2$ versus the input cost constraint $\Gamma_X$ for values of $R_1 = 0$, $R_1 = 0.5$ and $R_1 = 0.9$.

Remark 7. The communication requirements of Decoder 1 need not necessarily affect the system performance. For instance, consider Example 1 in [14, Sec. V.A], which includes a probing encoder as in Fig. 9 but transmitting over a different channel. There, it turns out that it is sufficient to have $\Pr[A = 1] \ge 0.2$ in order to achieve the same performance that can be achieved with full encoder channel state information. Therefore, the additional constraint on the rate of Decoder 1 (42a), namely $R_1 \le H(A)$, does not affect the sum-rate achievable in this example for any rate $R_1 \in [0, 1]$.

V. Concluding Remarks 

There is a profound interplay between actuation and communication in that actuation can be instrumental to improve the efficiency of communication, and, vice versa, communication, implicit or explicit, can provide an essential tool to improve control tasks. This work has focused on the first type of interplay, and has investigated the implications of embedding information directly in the actions for the aim of communicating with a separate decoder. The communication requirements of this decoder are generally in conflict with the goal of improving the efficiency of the given communication link. This performance trade-off has been studied here for both source and channel coding. The results provided in the paper allow one to give a quantitative answer to the questions posed in Sec. I-A regarding the impact of the requirements of action information embedding on the system performance. They also shed light on the structure of optimal embedding strategies, which turns out to be related, for the source coding model, to the strategies studied in [20], [24].
The investigation on the theme of information embedding on actions can be further developed
in a number of directions, including models with memory [8], [16] and with multiple terminals
im, [fT4|. [[TT]|. We also note that results akin to the ones reported here can be developed assuming 
causal state information at the decoder for source coding problems or causal state information 
at the transmitter. 

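Appendix A: Proof of Proposition 1

Here, we prove the converse part of Proposition 1. For any $(n, R, D_1 + \epsilon, D_2 + \epsilon, \Gamma + \epsilon)$ code, we have

$$\begin{aligned}
nR &\ge H(M) \\
&= I(M; X^n, Y^n) \\
&= H(X^n, Y^n) - H(X^n, Y^n \mid M) \\
&= \sum_{i=1}^{n} H(X_i) + H(Y_i \mid Y^{i-1}, X^n) - H(Y_i \mid Y^{i-1}, M) - H(X_i \mid X^{i-1}, M, Y^n) \\
&\overset{(a)}{\ge} \sum_{i=1}^{n} H(X_i) + H(Y_i \mid Y^{i-1}, X^n, A^n) - H(Y_i \mid Y^{i-1}, M, A^n) - H(X_i \mid X^{i-1}, M, Y^n, A^n) \qquad (45) \\
&\overset{(b)}{=} \sum_{i=1}^{n} H(X_i) + H(Y_i \mid Y^{i-1}, X^n, A^n, \hat{X}_{2i}) - H(Y_i \mid Y^{i-1}, M, A^n, \hat{X}_{2i}) - H(X_i \mid X^{i-1}, M, Y^n, A^n, \hat{X}_{2i}) \qquad (46) \\
&\overset{(c)}{\ge} \sum_{i=1}^{n} H(X_i) + H(Y_i \mid X_i, A_i, \hat{X}_{2i}) - H(Y_i \mid A_i, \hat{X}_{2i}) - H(X_i \mid U_i, Y_i, A_i, \hat{X}_{2i}), \qquad (47)
\end{aligned}$$

where (a) follows because $A^n$ is a function of $M$ and since conditioning reduces entropy; (b) follows since $\hat{X}_{2i}$ is a function of $A^n$; and (c) follows because we have the Markov relation $Y_i - (X_i, A_i, \hat{X}_{2i}) - (X^{n \setminus i}, A^{n \setminus i})$, by defining $U_i = (M, X^{i-1}, Y^{n \setminus i}, A^{i-1})$ and since conditioning decreases entropy.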

Defining $Q$ to be a random variable uniformly distributed over $[1, n]$ and independent of all the other random variables, and with $X = X_Q$, $Y = Y_Q$, $A = A_Q$, $\hat{X}_1 = \hat{X}_{1Q}$, $\hat{X}_2 = \hat{X}_{2Q}$ and $U = (U_Q, Q)$, from (47) we have

$$\begin{aligned}
R &\ge H(X \mid Q) + H(Y \mid X, A, \hat{X}_2, Q) - H(Y \mid A, \hat{X}_2, Q) - H(X \mid U, Y, A, \hat{X}_2, Q) \\
&\overset{(a)}{\ge} H(X) + H(Y \mid X, A, \hat{X}_2) - H(Y \mid A, \hat{X}_2) - H(X \mid U, Y, A, \hat{X}_2) \\
&= I(X; U, Y, A, \hat{X}_2) - I(Y; X \mid A, \hat{X}_2) \\
&= I(X; A, \hat{X}_2) + I(X; U \mid Y, A, \hat{X}_2),
\end{aligned}$$

where in (a) we have used the fact that $X^n$ is i.i.d., that conditioning reduces entropy, and the problem definition. Moreover, we have the following chain of inequalities

$$H(f(A^n)) \le \sum_{i=1}^{n} H(f(A_i)) = nH(f(A) \mid Q) \le nH(f(A)), \tag{48}$$

where the last inequality follows since conditioning reduces entropy, and

$$\begin{aligned}
H(f(A^n)) &\ge I(f(A^n); X^n) \\
&= \sum_{i=1}^{n} I(f(A^n); X_i \mid X^{i-1}) \\
&= \sum_{i=1}^{n} I(f(A^n), \hat{X}_{2i}; X_i \mid X^{i-1}) \\
&= \sum_{i=1}^{n} I(f(A^n), \hat{X}_{2i}, X^{i-1}; X_i) \\
&\overset{(a)}{\ge} \sum_{i=1}^{n} I(f(A_i), \hat{X}_{2i}; X_i) \\
&= n\big(H(X \mid Q) - H(X \mid f(A), \hat{X}_2, Q)\big) \\
&\overset{(b)}{\ge} n\big(H(X) - H(X \mid f(A), \hat{X}_2)\big) \\
&= nI(X; f(A), \hat{X}_2), \qquad (49)
\end{aligned}$$

where (a) follows by the chain rule for mutual information and since mutual information is non-negative; and (b) follows since $X^n$ is i.i.d. and due to the fact that conditioning decreases entropy. Combining (49) and (48), we obtain the inequality

$$I(X; f(A), \hat{X}_2) \le H(f(A)). \tag{50}$$
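As a side note on the single-letter lower bound on the rate derived earlier in this appendix, its last two equalities rest only on the definition and the chain rule of mutual information. The display below spells this out; it is a worked restatement rather than an additional step of the proof.

```latex
\begin{align*}
H(X) - H(X \mid U,Y,A,\hat{X}_2) &= I(X;U,Y,A,\hat{X}_2), \\
H(Y \mid X,A,\hat{X}_2) - H(Y \mid A,\hat{X}_2) &= -I(Y;X \mid A,\hat{X}_2), \\
I(X;U,Y,A,\hat{X}_2) - I(Y;X \mid A,\hat{X}_2)
 &= I(X;A,\hat{X}_2) + I(X;Y \mid A,\hat{X}_2) + I(X;U \mid Y,A,\hat{X}_2) - I(X;Y \mid A,\hat{X}_2) \\
 &= I(X;A,\hat{X}_2) + I(X;U \mid Y,A,\hat{X}_2).
\end{align*}
```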

We note that the joint pmf of the defined random variables factorizes as (8) since we have the Markov chain relationship $(\hat{X}_1, \hat{X}_2, U) - (A, X) - Y$ by the problem definition and that $\hat{X}_1$ is a function $g(U, Y)$ of $U$ and $Y$ by the definition of $U$. Moreover, from the cost and distortion constraints, we have

$$D_j + \epsilon \ge \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}[d_j(X_i, \hat{X}_{ji})] = \mathrm{E}[d_j(X, \hat{X}_j)], \quad \text{for } j = 1, 2, \tag{51a}$$

$$\text{and } \Gamma + \epsilon \ge \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}[\Lambda(A_i)] = \mathrm{E}[\Lambda(A)]. \tag{51b}$$

The cardinality constraint on the auxiliary random variable $U$ is obtained as follows using Caratheodory's theorem as in [3, Appendix C]. Note that we can write $I(X; \hat{X}_2, A) + I(X; U \mid \hat{X}_2, A, Y) = H(X) - H(X \mid \hat{X}_2, A) + H(X \mid \hat{X}_2, A, Y) - H(X \mid \hat{X}_2, A, Y, U)$. Now, to preserve the joint distribution of variables $(X, \hat{X}_2, A)$, and thus the distribution of all variables $(X, \hat{X}_2, A, Y)$ and the terms $H(X)$, $H(X \mid \hat{X}_2, A)$ and $H(X \mid \hat{X}_2, A, Y)$ (since $p(y \mid x, a)$ is fixed), the set $\mathcal{U}$ should have $|\mathcal{X}||\hat{\mathcal{X}}_2||\mathcal{A}| - 1$ elements; moreover, we need one further element to preserve the conditional entropy $H(X \mid \hat{X}_2, A, Y, U)$ and one for the distortion $\mathrm{E}[d_1(X, \hat{X}_1)]$.

Appendix B: Proof of Proposition 2 and Proposition 3

Here, we first prove the converse part of Proposition 2 and then describe the different steps needed to prove Proposition 3. The first part of the converse follows the same steps as in Appendix A. However, we note that in (45) and (46), we can write $A^i$ instead of $A^n$ without changing the following steps. This is due to the strictly causal dependence of $\hat{X}_{2i}$ on the action sequence, which is used in (46). This validates the claim in Remark 2. To prove the constraint in (13c), we have the following chain of inequalities

$$H(f(A^n)) = \sum_{i=1}^{n} H(f(A_i) \mid f(A^{i-1})) \le \sum_{i=1}^{n} H(f(A_i) \mid \hat{X}_{2i}) \le nH(f(A) \mid \hat{X}_2), \tag{52}$$

where the first inequality holds because $\hat{X}_{2i}$ is a function of $f(A^{i-1})$ and conditioning reduces entropy.
Moreover, we can write

$$\begin{aligned}
H(f(A^n)) &\ge I(f(A^n); X^n) \\
&= \sum_{i=1}^{n} I(f(A^n); X_i \mid X^{i-1}) \\
&= \sum_{i=1}^{n} I(f(A^n), \hat{X}_{2i}; X_i \mid X^{i-1}) \\
&= \sum_{i=1}^{n} I(f(A^n), \hat{X}_{2i}, X^{i-1}; X_i) \\
&\overset{(a)}{\ge} \sum_{i=1}^{n} I(f(A_i), \hat{X}_{2i}; X_i) \\
&= n\big(H(X \mid Q) - H(X \mid f(A), \hat{X}_2, Q)\big) \\
&\overset{(b)}{\ge} n\big(H(X) - H(X \mid f(A), \hat{X}_2)\big) \\
&= nI(X; f(A), \hat{X}_2), \qquad (53)
\end{aligned}$$

where (a) follows by the chain rule for mutual information and since mutual information is non-negative; and (b) follows since $X^n$ is i.i.d. and due to the fact that conditioning decreases entropy. Combining (53) and (52), we obtain the inequality (13c). We note that the joint pmf of the defined random variables factorizes as (12) since we have the Markov chain relationship $(\hat{X}_1, \hat{X}_2, U) - (A, X) - Y$ by the problem definition and that $\hat{X}_1$ is a function $g(U, Y)$ of $U$ and $Y$ by the definition of $U$ as in Appendix A. The distortion, cost and cardinality constraints are obtained as in Appendix A.

The converse for Proposition 3 follows from similar steps by defining $V_i = f(A^{i-1})$ and noting that $\hat{X}_{2i}$ is a function of $V_i$ and $f(A_i)$.

We bound the cardinality of the auxiliary random variables $U$ and $V$ for Proposition 3 using [3, Appendix C]. The bounds for $U$ in Proposition 2 follow in the same way. Note that we can write $I(X; V, A) + I(X; U \mid V, A, Y) = H(X) - H(X \mid V, A) + H(X \mid V, A, Y) - H(X \mid V, A, Y, U)$. Starting with $V$, the alphabet $\mathcal{V}$ should have $|\mathcal{X}| - 1$ elements to preserve the distribution $p(x)$ and hence $H(X)$, one element to preserve $-H(X \mid V, A) + H(X \mid V, A, Y)$, two elements to preserve the distortion constraints, and one more to preserve the condition $I(X; V, f(A)) \le H(f(A) \mid V)$. As for $U$, just as in Appendix A, $\mathcal{U}$ should have $|\mathcal{X}||\mathcal{V}||\mathcal{A}| - 1$ elements to preserve the joint distribution $p(x, v, a)$ (which preserves the joint distribution $p(x, a, v, y)$ and hence $H(X)$, $H(X \mid V, A)$, $H(X \mid V, A, Y)$), one element to preserve $H(X \mid V, A, Y, U)$ and one more to preserve the distortion constraint of Decoder 1.

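Appendix C: Proof of Proposition 4

Here, we prove the converse part of Proposition 4. To establish the converse, it is sufficient to consider the case of non-causal action observation, as done in the following. For any $(n, R, D_1 + \epsilon, D_2 + \epsilon, \Gamma + \epsilon)$ code, define the auxiliary variable $U_i = (Y^n, X^{i-1})$, and $Q$ as a random time-sharing variable uniformly distributed in the interval $[1, n]$ independent of $(X^n, U^n, \hat{X}_1^n, \hat{X}_2^n, A^n, Y^n)$. We then have

$$\begin{aligned}
H(Y^n) &= \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}) \\
&\le \sum_{i=1}^{n} H(Y_i) \\
&= \sum_{i=1}^{n} H(f_Y(A_i)) \\
&= nH(f_Y(A_Q) \mid Q) \\
&\le nH(f_Y(A_Q)). \qquad (54)
\end{aligned}$$

Also, we can write

$$\begin{aligned}
H(Y^n) &\ge I(X^n; Y^n) \\
&= \sum_{i=1}^{n} I(X_i; Y^n \mid X^{i-1}) \\
&\overset{(a)}{=} \sum_{i=1}^{n} I(X_i; Y^n, X^{i-1}) \\
&= \sum_{i=1}^{n} I(X_i; U_i) \\
&= nI(X_Q; U_Q \mid Q) \\
&\overset{(b)}{=} nI(X_Q; U_Q, Q), \qquad (55)
\end{aligned}$$

along with

$$\begin{aligned}
H(A^n \mid Y^n) &= \sum_{i=1}^{n} H(A_i \mid A^{i-1}, Y^n) \\
&\le \sum_{i=1}^{n} H(A_i \mid Y_i) \\
&= \sum_{i=1}^{n} H(A_i \mid f_Y(A_i)) \\
&= nH(A_Q \mid f_Y(A_Q), Q) \\
&\le nH(A_Q \mid f_Y(A_Q)), \qquad (56)
\end{aligned}$$

and

$$\begin{aligned}
H(A^n \mid Y^n) &\ge I(X^n; A^n \mid Y^n) \\
&\overset{(c)}{=} I(X^n; A^n, \hat{X}_2^n \mid Y^n) \\
&\ge \sum_{i=1}^{n} I(X_i; \hat{X}_{2i} \mid Y^n, X^{i-1}) \\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_{2i} \mid U_i) \\
&= nI(X_Q; \hat{X}_{2Q} \mid U_Q, Q). \qquad (57)
\end{aligned}$$

Furthermore, we have

$$\begin{aligned}
H(Y^n, M) &\le \sum_{i=1}^{n} H(Y_i) + nR \\
&\le nH(Y_Q) + nR, \qquad (58)
\end{aligned}$$

and

$$\begin{aligned}
H(Y^n, M) &\ge I(X^n; Y^n, M) \\
&\overset{(d)}{=} I(X^n; \hat{X}_1^n, Y^n, M) \\
&\ge \sum_{i=1}^{n} I(X_i; \hat{X}_{1i}, Y^n \mid X^{i-1}) \\
&\overset{(a)}{=} \sum_{i=1}^{n} I(X_i; \hat{X}_{1i}, Y^n, X^{i-1}) \\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_{1i}, U_i) \\
&\overset{(b)}{=} nI(X_Q; \hat{X}_{1Q}, U_Q, Q), \qquad (59)
\end{aligned}$$

where (a) follows from the independence of $X_i$ and $X^{i-1}$; (b) follows from the independence of $Q$ from all other random variables; (c) follows from the fact that $\hat{X}_2^n$ is a function of $A^n$; and (d) follows from the fact that $\hat{X}_1^n$ is a function of $(M, Y^n)$. Defining $U = (U_Q, Q)$ along with $X = X_Q$, $Y = Y_Q$, $A = A_Q$, $\hat{X}_1 = \hat{X}_{1Q}$, $\hat{X}_2 = \hat{X}_{2Q}$ and combining (54)-(57), (58) and (59), we obtain the rate region inequalities as mentioned in the proposition. Note that the joint distribution of the random variables $(X, Y, A, \hat{X}_1, \hat{X}_2)$ established above factorizes as $p(x) p(u, \hat{x}_1, \hat{x}_2, a \mid x)$ but can be restricted only to pmfs factorizing as in (21). This is because the information measures in (20)-(22) only depend on the marginals $p(x, u, \hat{x}_1)$, $p(a)$ and $p(x, u, \hat{x}_2)$. Distortion and cost constraints are handled in the standard manner [3].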

We bound the cardinality of the auxiliary random variable $U$ using [3, Appendix C]. The set $\mathcal{U}$ should have $|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2| - 1$ elements to preserve the joint distribution $p(x, \hat{x}_1, \hat{x}_2)$, one element to preserve the Markov chain $\hat{X}_1 - U - \hat{X}_2$, and three elements to preserve $H(X \mid \hat{X}_1, U)$, $H(X \mid \hat{X}_2, U)$ and $H(X \mid U)$.

Appendix D: Sketch of Proof of Achievability for Proposition 6

We will prove below that the following rate region is achievable

$$R_1 \le H(f(A)) \tag{60a}$$

$$R_1 + R_2 \le I(A, U; Y) - I(U; S \mid A), \tag{60b}$$

$$R_2 \le I(A; Y \mid f(A)) + I(U; Y \mid A) - I(U; S \mid A), \tag{60c}$$

for a given joint distribution as in (40). Assuming now that this rate region is achievable, we show that the rate region (39) is also achievable. Region (39) is larger than (60) owing to the absence of the inequality (60c). The two regions are illustrated in Fig. 11 for a given choice of the distribution (40), with region (60) in solid lines and (39) in dashed lines. We now argue that the achievability of region (60) (solid lines) implies the achievability of region (39) (dashed lines) as well, by following the same arguments as in [25]. Specifically, we observe that, if $(R_1, R_2)$ is achievable with some scheme, then $(R_1 - t, R_2 + t)$ is also achievable for all $0 \le t \le R_1$. This is due to the fact that some of the rate of the common message $M_1$ can always be transferred to the private message $M_2$ for Decoder 2. It follows immediately that all the points on the dashed line in Fig. 11 are also achievable.
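The rate-transfer argument can be checked mechanically once numerical values are assigned to the three right-hand sides of (60). The sketch below does this; the function names (`in_region_60`, `transfer_witness`) and the numbers used in the example are hypothetical and serve only to illustrate the argument.

```python
def in_region_60(r1, r2, h_f, i_sum, i_c, tol=1e-12):
    """Membership in (60) for given values of its three right-hand sides.

    h_f stands for H(f(A)), i_sum for the bound in (60b), i_c for (60c).
    """
    return (0 <= r1 <= h_f + tol and 0 <= r2 <= i_c + tol
            and r1 + r2 <= i_sum + tol)

def transfer_witness(r1, r2, h_f, i_sum, i_c):
    """Find t >= 0 such that (r1 + t, r2 - t) lies in (60), or return None.

    If such a t exists, (r1, r2) is obtained from a point of (60) by moving
    rate t from the common message M1 to the private message M2.
    """
    t = max(0.0, r2 - i_c)   # smallest shift that restores (60c)
    if in_region_60(r1 + t, r2 - t, h_f, i_sum, i_c):
        return t
    return None

# Hypothetical values: H(f(A)) = 1.0, sum-rate bound 1.5, bound (60c) = 0.8.
t = transfer_witness(0.2, 1.3, h_f=1.0, i_sum=1.5, i_c=0.8)
```

Here the pair $(0.2, 1.3)$ violates (60c) but satisfies (39), and the witness shows that it is reached from the point $(0.7, 0.8)$ of (60) by transferring half a bit. Taking the smallest admissible shift is sufficient because a larger shift only tightens the constraint on the first coordinate while leaving the sum unchanged.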

The discussion above allows us to conclude that, if region (60) is achievable, then the desired rate region (39) is also achievable. We now focus on proving the achievability of (60). To this end, we combine superposition coding and the technique proposed in [2]. Fix the joint distribution as in (40). We first generate the codebook $b^n(m_1)$, $m_1 \in [1 : 2^{nR_1}]$, i.i.d. with pmf $p(b)$. Next, we generate a superimposed codebook for each $b^n$ of $a^n(m_1, m_2)$ codewords, $m_2 \in [1 : 2^{nR_2}]$, i.i.d. with pmf $p(a \mid b)$. For every $a^n$ sequence, a codebook of $u^n(m_1, m_2, j)$ sequences is generated, $j \in [1 : 2^{n\tilde{R}}]$, i.i.d. with pmf $p(u \mid a)$.

To encode messages $(m_1, m_2)$, the Encoder selects the codeword $a^n(m_1, m_2)$, and chooses a $u^n$ codeword jointly typical with the action and state sequences, which requires $\tilde{R} \ge I(U; S \mid A)$. Then $x_i = g(u_i, s_i)$ is sent through the channel. Decoder 1 decodes the message $m_1$ correctly if $R_1 \le H(B)$. Decoder 2 looks for the unique pair of messages $(m_1, m_2)$ such that the tuple $(y^n, b^n(m_1), a^n(m_1, m_2), u^n(m_1, m_2, j))$ is jointly typical for some $j \in [1 : 2^{n\tilde{R}}]$. This step is reliable if $R_1 + R_2 + \tilde{R} \le I(A, U; Y)$ and $R_2 + \tilde{R} \le I(U, A; Y \mid B) = I(A; Y \mid B) + I(U; Y \mid A)$. Using Fourier-Motzkin elimination to eliminate the rate $\tilde{R}$ leads to the bounds (60).
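Since the auxiliary codebook rate enters with one lower bound and two upper bounds, the elimination step is short enough to write out. In the display below, $\tilde{R}$ denotes the rate of the $u^n$ codebook and $B = f(A)$; this is a worked version of the step described above, not an additional result.

```latex
\begin{align*}
\tilde{R} \ge I(U;S \mid A), \quad R_1 + R_2 + \tilde{R} \le I(A,U;Y)
  \quad &\Longrightarrow \quad R_1 + R_2 \le I(A,U;Y) - I(U;S \mid A), \\
\tilde{R} \ge I(U;S \mid A), \quad R_2 + \tilde{R} \le I(A;Y \mid B) + I(U;Y \mid A)
  \quad &\Longrightarrow \quad R_2 \le I(A;Y \mid B) + I(U;Y \mid A) - I(U;S \mid A).
\end{align*}
```

Together with $R_1 \le H(B) = H(f(A))$, these are exactly (60a)-(60c); conversely, any rate pair satisfying them admits the choice $\tilde{R} = I(U;S \mid A)$.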